Add SQLite state store for persistent resources#142
Conversation
Add a local SQLite-backed state store for persistent process metadata and cross-instance resource leases. Wire the store into controller startup and persistent executable/container/network coordination, and manage schema updates through embedded golang-migrate migrations with compatibility authoring guidance.\n\nCo-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
b86aa92 to
5ee3058
Compare
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
karolz-ms
left a comment
There was a problem hiding this comment.
Still reviewing, but I thought I will share this with you now
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remove persisted lease expiration state and use updated-at only for leaseholder revalidation. Also make SQLite file DSNs compatible with Windows drive-letter paths. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add a GetLeaseKey interface for resource lease operations and implement it on Container and ContainerNetwork. Remove the redundant statestore key helpers and normalize touched file headers. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Describe persistent executable API and state-store behavior at a high level instead of copying source declarations into docs. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remove branch-oriented change wording so the documentation describes executable persistence as stable behavior. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Coordinate persistent Executable reconciliation through resource leases, use Executable.GetLeaseKey for process records, and add interface guards for leasable API resources. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
karolz-ms
left a comment
There was a problem hiding this comment.
Wow, that was a lot. 😆 Looks awesome overall
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Only thing that's missing for now that we'll likely end up wanting is a setting to allow a persistent resource to be cleaned up when a process with a given PID exits. That'll let us handle cases where we want a semi-long lived process or container that's still cleaned up automatically when some parent resource exits (such as cleaning up when the IDE exits or |
There was a problem hiding this comment.
Pull request overview
Adds a local SQLite-backed state store to persist DCP coordination metadata (persistent process records + cross-instance resource leases) and wires it into controller startup plus persistent Executable/Container/Network reconciliation.
Changes:
- Introduce
internal/statestore(SQLite open/config, embedded golang-migrate migrations, persistent process records, and resource leases with stale-lease cleanup). - Add Executable persistence support (new API fields, adoption flow, runner adoption interface, and controller logic to persist/adopt/stop stale records).
- Coordinate persistent Container/Network creation via state-store-backed leases; update integration test environments to provision isolated state stores.
Reviewed changes
Copilot reviewed 44 out of 48 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| test/integration/test_env_common.go | Adds helper to create/cleanup an isolated SQLite state store for integration tests. |
| test/integration/standard_test_env.go | Wires state store + lease owner into standard integration controller startup. |
| test/integration/service_controller_test_not_darwin.go | Normalizes build tag/header placement (no functional change). |
| test/integration/advanced_test_env.go | Wires state store + lease owner into advanced integration controller startup. |
| README.md | Documents DCP_STATE_STORE_PATH environment variable. |
| pkg/generated/openapi/zz_generated.openapi.go | Regenerates OpenAPI schema to include new ExecutableSpec fields. |
| internal/testutil/ctrlutil/test_process_executable_runner.go | Adds AdoptRun passthrough to support persistent run adoption in tests. |
| internal/testutil/ctrlutil/apiserver_start.go | Sets DCP_STATE_STORE_PATH for test-started apiserver instances. |
| internal/statestore/store.go | Implements SQLite store open/configure (WAL, busy timeout, restricted perms) and helpers. |
| internal/statestore/store_test.go | Adds coverage for DSN formatting, migrations, leases, and persistent process CRUD. |
| internal/statestore/schema.go | Embeds migrations and applies them with golang-migrate under a lockfile. |
| internal/statestore/processes.go | Implements persistent process record upsert/get/delete. |
| internal/statestore/migrations/README.md | Adds authoring/compatibility rules for SQLite migrations. |
| internal/statestore/migrations/000001_initial.up.sql | Initial schema for resource_locks and persistent_processes. |
| internal/statestore/leases.go | Implements cross-process resource leases, held-lease errors, and stale cleanup. |
| internal/statestore/lease_owner.go | Defines resource lease owner identity based on PID + identity time. |
| internal/proxy/network_stream_windows.go | Normalizes build tag/header placement (no functional change). |
| internal/proxy/network_stream_unix.go | Normalizes build tag/header placement (no functional change). |
| internal/networking/networking_windows.go | Normalizes build tag/header placement (no functional change). |
| internal/networking/networking_linux.go | Normalizes build tag/header placement (no functional change). |
| internal/networking/networking_darwin.go | Normalizes build tag/header placement (no functional change). |
| internal/exerunners/process_executable_runner.go | Updates process runner to support persistent runs (no monitor/kill-on-dispose) + AdoptRun. |
| internal/exerunners/process_executable_runner_test.go | Adds test ensuring persistent executables skip process monitoring. |
| internal/dcpctrl/commands/run_controllers.go | Opens state store at controller startup, cleans stale leases, passes store/config into reconcilers. |
| go.mod | Adds dependencies for golang-migrate and modernc SQLite. |
| go.sum | Updates module sums for new dependencies. |
| doc/executable-persistence-api.md | Adds API/documentation for start, persistent, lifecycle keys, and state store behavior. |
| controllers/service_controller_metrics.go | Formatting/header normalization (no functional change). |
| controllers/resource_lease_logging.go | Adds helper to log held-lease details consistently. |
| controllers/network_controller.go | Adds NetworkReconciler config + uses resource leases for persistent network operations. |
| controllers/executable_start_result.go | Adds process identity/display start timestamps to startup results and run info propagation. |
| controllers/executable_runner.go | Introduces PersistentExecutableRunner + adoption info contract. |
| controllers/executable_run_info.go | Stores process identity/display start timestamps on run info. |
| controllers/executable_persistence.go | Implements persistence lifecycle key computation, record upsert/delete, and adoption logic. |
| controllers/executable_controller.go | Wraps persistent reconciles in a resource lease; adopts/persists/deletes records for persistent executables. |
| controllers/executable_controller_test.go | Adds unit tests for lifecycle key behavior and adoption flows. |
| controllers/endpoint_common.go | Formatting/header normalization (no functional change). |
| controllers/controller_common.go | Adds shared resourceLeaseRevalidationInterval constant. |
| controllers/container_controller.go | Refactors startup flow; uses state-store resource leases to coordinate persistent container creation/reuse. |
| api/v1/zz_generated.deepcopy.go | Regenerates deepcopy for new ExecutableSpec fields. |
| api/v1/executable_types.go | Adds start, persistent, lifecycleKey, lifecycle-key helper, validation, ShouldStart, and lease key. |
| api/v1/executable_types_test.go | Adds tests for Start semantics, lease keys, and update validation. |
| api/v1/container_types.go | Adds lease key support for persistent container coordination. |
| api/v1/container_types_test.go | Adds tests for Container/ContainerNetwork lease keys. |
| api/v1/container_network_types.go | Adds lease key support for persistent network coordination. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 48 out of 52 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
controllers/executable_controller.go:549
- In the previous-start-attempt failure path, the code overwrites a non-nil StartupError with the generic "unknown" error. This looks inverted: it should only fall back to the generic error when StartupError is nil, otherwise the real error is lost and logs become misleading.
runErr := startResult.StartupError
if runErr != nil {
runErr = unknownStartupFailureErr
}
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Treat persistent network lease configuration failures as terminal startup errors instead of retrying indefinitely, and close process run output files when persistent runs are released. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remove the point-in-time SQLite file restriction pass and let SQLite own database and sidecar file creation. The state store now only ensures the parent directory exists before opening the database. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remove unused metadata from resource leases and persistent process records while this schema is still in draft. Persistent process records now keep only durable process state, and lifecycle diagnostics sort changed executable environment names for stable logs. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Rename the lease contention log field from LeaseOwnerStartTime to LeaseOwnerIdentityTime because the value is a process identity discriminator and is not necessarily a wall-clock start time. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
| processCtx := ctx | ||
| if exe.Spec.Persistent { | ||
| creationFlags = process.CreationFlagsNone | ||
| processCtx = context.WithoutCancel(ctx) |
There was a problem hiding this comment.
Persistent runs detach from DCP lifetime here, but stdout/stderr above are still TimestampWriters owned by this controller process. exec.Cmd pumps child output through parent-owned pipes into those writers; when DCP exits while the persistent child survives, those pipes go away, so later child output can hit broken pipe/terminate or stop being captured. Persistent processes should redirect directly to log files or another sink that outlives DCP before using WithoutCancel/CreationFlagsNone.
There was a problem hiding this comment.
Switched to a bare output (without TimestampWriter) for persistent executables for now so that we can at least preserve output across sessions.
I think long term we should introduce a new DCP sub-command for handling this output mapping; we'd use it to launch the actual process and have it handle writing timestamp entries to the log files. It'd match its child's lifetime so as to make sure everything gets cleaned up whenever the child process goes away. That's about the only way I can think of to give timestamp output for persistent executables that'd be reliable across runs.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
| dir, dirErr := persistentExecutableOutputDir() | ||
| if dirErr != nil { | ||
| return nil, fmt.Errorf("could not determine persistent Executable output directory: %w", dirErr) | ||
| } | ||
| if mkdirErr := os.MkdirAll(dir, osutil.PermissionOnlyOwnerReadWriteTraverse); mkdirErr != nil { | ||
| return nil, fmt.Errorf("could not create persistent Executable output directory '%s': %w", dir, mkdirErr) | ||
| } | ||
| if chmodErr := os.Chmod(dir, osutil.PermissionOnlyOwnerReadWriteTraverse); chmodErr != nil { | ||
| return nil, fmt.Errorf("could not restrict persistent Executable output directory '%s': %w", dir, chmodErr) | ||
| } | ||
|
|
||
| return usvc_io.OpenFile(filepath.Join(dir, fileName), os.O_RDWR|os.O_CREATE|os.O_EXCL, osutil.PermissionOnlyOwnerReadWrite) |
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
| func ensureStateStoreDir(path string) error { | ||
| parentDir := filepath.Dir(path) | ||
| if mkdirErr := os.MkdirAll(parentDir, osutil.PermissionOnlyOwnerReadWriteTraverse); mkdirErr != nil { | ||
| return fmt.Errorf("could not create state store directory '%s': %w", parentDir, mkdirErr) | ||
| } | ||
| return nil |
| func validateRestrictedDirectoryOwner(os.FileInfo) error { | ||
| return nil |
Uh oh!
There was an error while loading. Please reload this page.